 github project


GitHub's Deepfake Porn Crackdown Still Isn't Working

WIRED

In late November, a deepfake porn maker claiming to be based in the US uploaded a sexually explicit video to the world's largest site for pornographic deepfakes, featuring TikTok influencer Charli D'Amelio's face superimposed onto a porn performer's body. Although D'Amelio presumably played no role in its production, the video was viewed more than 8,200 times and captured the attention of other deepfake fans. "What program did you use for creating the deepfake??" one user going by the name balascool commented. D'Amelio's agent did not reply to a request for comment. The video's creator, "DeepWorld23," has claimed in the comments that the program was a deepfake model hosted on the developer platform GitHub.


An Empirical Study on the Usage of Automated Machine Learning Tools

Majidi, Forough; Openja, Moses; Khomh, Foutse; Li, Heng

arXiv.org Artificial Intelligence

The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize tasks such as feature engineering, model training, and hyperparameter optimization. Recent work has conducted qualitative studies of practitioners' experiences with AutoML tools and compared different AutoML tools by their performance and features, but no existing work has studied how AutoML tools are used in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are most used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. We also identified the purposes of using AutoML tools (e.g., model parameter sampling, search space management, model evaluation/error analysis, data/feature transformation, and data labeling) and the stages of the ML pipeline (e.g., feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages so that they can pick the right tool for their purposes. In addition, AutoML tool developers can use our findings to gain insight into how their tools are used and improve them to better fit users' needs.
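
To make the co-usage measurement concrete, here is a minimal Python sketch of one way to count how often AutoML tools are used together in the same source file. It assumes that usage can be detected from top-level import statements, and the tool list `AUTOML_TOOLS` is a hypothetical example rather than the study's top-10 list; `top_level_imports` and `tool_cooccurrence` are illustrative names, not part of the paper.

```python
import ast
from collections import Counter
from itertools import combinations

# Hypothetical tool list for illustration (import names, not PyPI names);
# the study's actual top-10 AutoML tools are not listed in the abstract.
AUTOML_TOOLS = {"optuna", "hyperopt", "autosklearn", "tpot", "keras_tuner"}


def top_level_imports(source: str) -> set:
    """Return the top-level package names imported by one Python source file."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import optuna.visualization as ov` -> "optuna"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from hyperopt import fmin` -> "hyperopt"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


def tool_cooccurrence(sources: list) -> Counter:
    """Count, over all files, how often each pair of AutoML tools is imported together."""
    pairs = Counter()
    for source in sources:
        tools = sorted(top_level_imports(source) & AUTOML_TOOLS)
        # Sorting gives each pair one canonical order, e.g. ("hyperopt", "optuna")
        pairs.update(combinations(tools, 2))
    return pairs
```

For example, given one file that imports both `optuna` and `hyperopt` and another that imports only `optuna`, the counter records a single `("hyperopt", "optuna")` pair, because only the first file uses both tools.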


Top 10 Trending Open-Source Python Projects on GitHub

#artificialintelligence

Python is one of the most popular programming languages among developers and programmers. Meanwhile, GitHub offers an open-source community with Git repositories to more than 73 million developers. Working on Python projects is an efficient way to deepen one's knowledge of the language, and GitHub can help with that. Open-source Python projects are a valuable way to learn how to solve complex problems in the global tech market, and GitHub hosts countless projects that can help developers work with open-source Python code effectively.


The shape of the urine stream

#artificialintelligence

Today we'll learn about Deep Learning and an organ only men have. A documentary inspired me to start a small Deep Learning project with PyTorch that will give beginners some insight into how to start a Deep Learning project. I hope you enjoy it! Since 1994, the most common type of cancer among men in Austria has been prostate cancer [1]. Assessing the health of the prostate or the bladder requires a medical examination, which can be unpleasant for men.


Predicting long-time contributors for GitHub projects using machine learning

#artificialintelligence

Many organizations develop software systems using open source software (OSS), which is risky due to the high possibility of losing support. Contributors are critical for the survival of OSS projects, but very few new contributors remain with OSS projects long enough to become long-time contributors (LTCs). Identifying the factors that lead a new contributor to become an LTC can help OSS project owners use their limited resources to retain new contributors. In this paper, we investigate whether we can effectively predict which new contributors to OSS repositories will become long-time contributors, based on repository and contributor metadata collected from GitHub. We construct a dataset containing 70,899 observations from the 888 most popular repositories, with 56,766 contributors.
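
As a rough illustration of the prediction task, the Python sketch below labels a contributor as an LTC from their first and last contribution dates and fits a tiny logistic regression by batch gradient descent. The one-year threshold (`LTC_MIN_DAYS`), the function names, and the choice of logistic regression are assumptions made for illustration; the abstract states neither the paper's LTC definition nor its model.

```python
import math
from datetime import date

# Hypothetical LTC threshold: the abstract does not give the paper's definition.
LTC_MIN_DAYS = 365


def is_long_time_contributor(first: date, last: date, min_days: int = LTC_MIN_DAYS) -> int:
    """Label 1 if the span between first and last contribution is at least min_days."""
    return int((last - first).days >= min_days)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit [bias, w1, ..., wd] by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # Prediction error for one contributor: p(LTC) minus the true label
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x) -> float:
    """Estimated probability that a contributor with features x becomes an LTC."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

The feature vectors would come from the repository and contributor metadata described above; standardizing them before training helps plain gradient descent converge.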


Everything you need to know about GitHub Copilot

#artificialintelligence

I was fortunate enough to be given early access to GitHub's new "AI pair programmer," Copilot, which has caused quite a stir. In this blog post, I share my first impressions of and experiences with the tool. It's made me shout "wow" a couple of times in the last few hours, which isn't something you'd expect from your developer tools! However, the tool currently has some real-world limitations, which I'll go through in this article. In summary: Copilot appears out of nowhere, interrupting my flow.


7 Innovative Machine Learning GitHub Repositories in Python

#artificialintelligence

Quite a mix of machine learning projects we have here. I have provided tutorials, guides and resources after each GitHub project. I have one ask: pick the project that interests you, go through the tutorial, and then apply that library to a problem of your own. For example, you could take the NeuralClassifier repository and use it to solve a multi-label classification problem. This will help you broaden your understanding of the topic and expand your current skillset.
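
Multi-label problems differ from ordinary classification in that one sample can carry several labels at once. Here is a framework-independent Python sketch of the usual target encoding (one multi-hot vector per sample) and of decoding per-label scores with a threshold. It is not NeuralClassifier's API; the class names and the 0.5 threshold are hypothetical.

```python
def multi_hot(label_sets, classes):
    """Encode each sample's set of labels as a 0/1 vector over `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # several positions can be 1 for one sample
        rows.append(row)
    return rows


def decode(scores, classes, threshold=0.5):
    """Turn independent per-label probabilities into a predicted label set."""
    return {c for c, s in zip(classes, scores) if s >= threshold}
```

For instance, with `classes = ["sports", "politics", "tech"]`, `multi_hot([{"tech"}, {"sports", "politics"}], classes)` returns `[[0, 0, 1], [1, 1, 0]]`, and `decode([0.9, 0.2, 0.7], classes)` returns the set `{"sports", "tech"}`. Because each label is thresholded independently, a model for this setting typically uses a per-label sigmoid rather than a softmax over classes.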


Autoencoders: Deep Learning with TensorFlow's Eager Execution

#artificialintelligence

Deep Learning has revolutionized the Machine Learning scene in recent years. Can we apply it to image compression? How well can a Deep Learning algorithm reconstruct pictures of kittens? Today we'll find the answers to all of those questions. I've talked about Unsupervised Learning before: applying Machine Learning to discover patterns in unlabelled data.
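
To convey the compress-then-reconstruct idea, here is a minimal NumPy sketch of a linear autoencoder. An encoder matrix squeezes each 8-value sample into a 2-value code, and a decoder matrix rebuilds the input from that code. Both matrices are trained by gradient descent on the mean squared reconstruction error. This sketch uses NumPy in place of TensorFlow's eager execution and synthetic data in place of kitten pictures; the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": 200 flattened 8-pixel samples lying near a 2-D subspace,
# so a 2-unit bottleneck can in principle reconstruct them well.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 8))


def train_linear_autoencoder(X, code_dim=2, lr=0.05, epochs=3000, seed=0):
    """Train encoder/decoder matrices with plain gradient descent on MSE."""
    init_rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = init_rng.normal(scale=0.1, size=(d, code_dim))
    W_dec = init_rng.normal(scale=0.1, size=(code_dim, d))
    losses = []
    for _ in range(epochs):
        code = X @ W_enc        # encode: compress d values into code_dim values
        recon = code @ W_dec    # decode: reconstruct the d values from the code
        err = recon - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error with respect to both matrices
        grad_recon = 2 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = X.T @ (grad_recon @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses
```

A real image autoencoder would use nonlinear layers and far more pixels, but the structure is the same: a narrow code in the middle forces the network to keep only the information needed to rebuild the input.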


Import2vec - Learning Embeddings for Software Libraries

Theeten, Bart; Vandeputte, Frederik; Van Cutsem, Tom

arXiv.org Machine Learning

We consider the problem of developing suitable learned representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g., classification) or applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent each library by its context of use, as determined by the import statements present in source code, so that libraries used in similar contexts receive similar vectors. Experimental results obtained from training such embeddings on three large open-source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins, and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript), and data science toolkits (in Python).
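
The core intuition is that libraries imported in similar contexts should end up with similar vectors, and this can be sketched without a neural network. The Python example below is a count-based stand-in for the word-embedding training described in the abstract, not the Import2vec method itself. It treats every library imported in a file as context for the others, builds co-occurrence count vectors, and compares them with cosine similarity. The import lists used with it are hypothetical toy data.

```python
import math
from collections import defaultdict
from itertools import permutations


def cooccurrence_vectors(files):
    """Map each library to counts of the other libraries imported alongside it.

    Every library imported in a file is treated as context for every other
    library in that file (a context window spanning the whole import list).
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for imports in files:
        for target, context in permutations(set(imports), 2):
            vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With the import lists `[["numpy", "pandas", "matplotlib"], ["numpy", "pandas", "sklearn"], ["flask", "sqlalchemy"], ["pandas", "sklearn"]]`, `numpy` and `sklearn` share context (both appear alongside `pandas`) and get a positive similarity, while `numpy` and `flask` never share context and score 0.0. Learned embeddings such as word2vec compress this kind of co-occurrence signal into short dense vectors.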